Abstract

A wide variety of information is represented by image features, which may be applied in applications such as image fusion, video processing, medical diagnosis, traffic safety monitoring, visual surveillance, feature matching, image segmentation, pattern matching, person identification, sentiment analysis, human-computer interaction and many more fields. This paper focuses on multi-temporal extraction of the eyes feature from human face images captured in two different time slots. The aim is to reduce the semantic gap between the images and to improve image quality through image fusion using PCA, SWT and a hybrid PCA and SWT approach. A comparative study of multi-temporal image fusion using Principal Component Analysis (PCA), the Stationary Wavelet Transform (SWT) and the hybrid PCA+SWT approach is carried out, and the experimental results are evaluated through a performance analysis. Image fusion performance is compared using eight quantitative quality measures: SSIM, MSE, NAE, CC, SC, AD, SD and MI. The comparison shows that the hybrid PCA+SWT approach can improve image fusion performance. This approach may be useful wherever human facial features can be utilized.

Keywords: Feature Extraction; Semantic Gap; Image Fusion


1. Introduction 


An image is a vast source of information that is represented by features. These features capture the distinguishing characteristics of an image, such as its texture, density, color, shape and brightness. They can be categorized as pixel-level features, domain-specific features, local features and global features. Therefore, identifying accurate features and extracting appropriate features is an important aspect of image processing. [1,2] The significant challenges in retrieving appropriate image information include the diverse aspects of images, the database structure, noise in an image, the availability of images from different perspectives, image quality and the human perspective. [3,4,5] Hence, to retrieve correct image information, several parameters have to be considered, such as the selection of appropriate features, the choice of feature extraction technique and the constraints set at the time of feature extraction. These may vary to some extent with the requirements of the application, but the main objectives of feature extraction in image processing remain error reduction, higher accuracy and better visual perception. When these extracted features are used, a semantic gap may exist that affects the quality of the image; for better image quality, this gap should be minimal. Image fusion is one way to reduce the semantic gap: it combines the feature information of two different sources capturing the same object to improve perception and image quality. [4,6,7,8]

2. Objective 

The objective of this work is to extract the eyes feature from human face images captured in two different time slots and to fuse these multi-temporal images using PCA, SWT and a hybrid PCA+SWT approach. The purpose is to reduce the semantic gap between the images and to improve the quality of the fused image. The fused images are compared using eight quantitative image quality measures (SSIM, MSE, NAE, CC, SC, AD, SD and MI).
3. Feature Extraction 
Feature extraction is the process of retrieving appropriate features from an image to obtain more accurate information about it. Facial feature extraction is in demand in many applications, such as person identification, sentiment analysis, human-computer interaction and image fusion. The eyes, nose and mouth are the most prominent features in face images. An informational segment is valuable only if it conveys the actual content clearly and accurately. Many factors may affect the accuracy of facial feature detection, such as the following [9,10,11,12]:

i) Position of the person relative to the camera
ii) Lighting effect
iii) Distance from the camera
iv) Angle between camera and person
v) Zoom setting of the camera
vi) Background of the object, etc.


Because of these factors, a semantic gap may be present in the image. The semantic gap is the difference between a user's high-level understanding of an image and the information derived from the image's low-level feature properties. It arises from different image capturing conditions, including multiresolution, multimodal and multi-focus images and variations in capture time. The effects of background, noise, illumination, distance, angle, etc. also contribute. A smaller semantic gap between images indicates higher image quality and allows better interpretation of the image, so many image processing applications require the semantic gap to be minimal. [8,9,13] This discrepancy between the information extracted from visual data and the user's understanding of the same data in a given situation can be reduced by fusing two or more images.

4. Image Fusion 

Image fusion combines significant information from several image sources into a composite image that is more informative and accurate than any of the original input images, with minimal data loss or distortion. Ideally, the fused image contains no redundancy or artifacts, which makes it more robust. The fusion process should also suppress irrelevant features that could distract or mislead subsequent image processing steps. Single image sensors suffer from limitations such as optical constraints, poor-quality image capture, and a lack of clarity and quality. Because of these limitations, the demand for image fusion in image processing applications has grown rapidly. The fusion process must preserve spatial information, improve the visual quality of the fused image and retain all useful and relevant information. [14,15,16]

5. Literature Review 

Significant research has been carried out on image fusion to combine information from two or more images and obtain more precise information about a scene. Image fusion in face detection applications is complicated by the challenges of face detection itself. Many researchers have therefore performed experiments with various modalities to improve face detection accuracy in unconstrained environments. Existing face detection methods include Viola-Jones face detection, Convolutional Neural Networks (CNN), Edge Orientation Matching and Support Vector Machines (SVM). The most popular and commonly used of these is the Viola-Jones face detection algorithm, with reported accuracy of up to 93.7% [17].

Existing image fusion techniques are categorized into spatial-domain, transform-domain and statistical-domain methods. Based on the level at which fusion occurs, they are classified as pixel-level, feature-level and decision-level. Depending on the process and data sources, they are classified as multiview, multimodal, multitemporal and multifocus fusion. [18]

Spatial domain methods operate directly on the pixel values of an image, which are manipulated to achieve the required result. Typical spatial-domain fusion methods include principal component analysis (PCA), Intensity Hue Saturation (IHS), the averaging method, the weighted average method, the Brovey method, and the maximum and minimum methods. These methods use a set of fusion rules to directly select appropriate pixels, blocks or regions from the source images to compose a fused image, without performing any transformation. This approach is simple to implement and takes little time. However, it may suffer from blurred edges and reduced sharpness, which significantly reduce the contrast of the fused image. [19,20,21,22]
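The spatial-domain rules listed above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes two registered grayscale images of equal size stored as floating-point arrays; all function names are hypothetical. For PCA, it follows a commonly used scheme in which the fusion weights are the normalized components of the principal eigenvector of the 2×2 covariance matrix of the two source images. The paper does not state its exact PCA formulation, so treat this as one plausible variant rather than the authors' implementation.

```python
import numpy as np


def pca_weights(img_a: np.ndarray, img_b: np.ndarray) -> tuple[float, float]:
    """Fusion weights from the principal eigenvector of the 2x2 covariance
    matrix of the two flattened source images (a common PCA-fusion variant)."""
    data = np.stack([img_a.ravel(), img_b.ravel()]).astype(np.float64)  # shape (2, N)
    cov = np.cov(data)                  # 2x2 covariance between the two images
    _, eigvecs = np.linalg.eigh(cov)    # eigenvalues are returned in ascending order
    # Principal component; abs() removes the sign ambiguity of eigenvectors
    # (this assumes the sources are positively correlated, as two registered
    # photos of the same eye region normally are).
    v = np.abs(eigvecs[:, -1])
    w = v / v.sum()                     # normalise so the two weights sum to 1
    return float(w[0]), float(w[1])


def fuse_pca(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Weighted sum of the sources using the PCA-derived weights."""
    w_a, w_b = pca_weights(img_a, img_b)
    return w_a * img_a.astype(np.float64) + w_b * img_b.astype(np.float64)


def fuse_average(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Average rule: pixel-wise mean of the two sources."""
    return (img_a.astype(np.float64) + img_b.astype(np.float64)) / 2.0


def fuse_max(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Maximum rule: keep the brighter pixel at every location."""
    return np.maximum(img_a, img_b).astype(np.float64)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    session1 = rng.random((180, 160))  # stand-in for a 160x180 (width x height) eye crop
    session2 = np.clip(session1 + 0.05 * rng.standard_normal(session1.shape), 0.0, 1.0)
    print(pca_weights(session1, session2))
    print(fuse_pca(session1, session2).shape)
```

With two identical inputs, the PCA weights are exactly 0.5 each, so PCA fusion reduces to the average rule. As the inputs diverge, the weights shift toward the source with the larger variance.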
Shumin et al. proposed a hybrid approach to multi-focus image fusion. It uses a focused-area decision map in the spatial domain and the DWT in the transform domain, and it produced high-quality fused images with low algorithmic complexity [20]. Discrete cosine transform (DCT) based fusion methods are suitable and time-saving [25], but they produce blocking artifacts; this problem can be addressed by the wavelet transform. Spatial domain-based methods can obtain excellent results, but they may create undesirable block artifacts and reduced contrast. They are also limited by the fusion rule and often produce unwanted artifacts at the boundaries between focused and non-focused regions. These limitations are handled better by transform-domain methods, which can reduce the artifacts to some extent. [23,24,26]

In the transform (or frequency) domain, the pixel values are first transformed into the frequency domain, and the frequency components are then modified. [27] The source images are transformed from the spatial domain to another domain using a suitable transform, such as wavelets or pyramids, and are decomposed into a series of levels or multiscale coefficients. [21,28] The corresponding coefficients or sub-images are then fused by applying a suitable fusion rule. Finally, the fused image is reconstructed by applying the inverse transform to the fused coefficients. The fused image preserves the feature information of the source images while reducing spatial distortion. [21, 29]
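As an illustration of the decompose, fuse and reconstruct pipeline described above, the following sketch performs a one-level stationary (undecimated) Haar wavelet transform using NumPy alone. It rests on several assumptions. Boundaries are treated as periodic via `np.roll`. The Haar filters use an averaging normalization, whereas library implementations typically use orthonormal 1/√2 filters; this only rescales the coefficients. Finally, it uses the common rules of averaging the approximation bands and keeping the larger-magnitude detail coefficient, because the paper does not state which SWT fusion rules were used.

```python
import numpy as np


def _analysis(x: np.ndarray, axis: int):
    """One undecimated Haar step along `axis` (periodic boundary, no downsampling)."""
    nxt = np.roll(x, -1, axis=axis)            # x[n + 1], wrapping at the edge
    return (x + nxt) / 2.0, (x - nxt) / 2.0    # low-pass, high-pass


def _synthesis(lo: np.ndarray, hi: np.ndarray, axis: int) -> np.ndarray:
    """Inverse step: average the two exact reconstructions
    x[n] = lo[n] + hi[n]  and  x[n] = lo[n - 1] - hi[n - 1]."""
    return ((lo + hi) + np.roll(lo - hi, 1, axis=axis)) / 2.0


def swt2_haar(img: np.ndarray):
    """Level-1 2-D SWT: one approximation band and three detail bands,
    all the same size as the input (no decimation)."""
    lo, hi = _analysis(img, axis=1)            # filter horizontally (along axis 1)
    ll, d1 = _analysis(lo, axis=0)             # then vertically (along axis 0)
    d2, d3 = _analysis(hi, axis=0)
    return ll, (d1, d2, d3)


def iswt2_haar(ll: np.ndarray, details) -> np.ndarray:
    """Inverse of swt2_haar."""
    d1, d2, d3 = details
    lo = _synthesis(ll, d1, axis=0)
    hi = _synthesis(d2, d3, axis=0)
    return _synthesis(lo, hi, axis=1)


def fuse_swt(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Average the approximation bands, keep the larger-magnitude detail coefficient."""
    ll_a, det_a = swt2_haar(img_a.astype(np.float64))
    ll_b, det_b = swt2_haar(img_b.astype(np.float64))
    ll_f = (ll_a + ll_b) / 2.0
    det_f = tuple(np.where(np.abs(da) >= np.abs(db), da, db)
                  for da, db in zip(det_a, det_b))
    return iswt2_haar(ll_f, det_f)
```

Because no downsampling takes place, every band keeps the input size. This redundancy is what makes the SWT shift-invariant, and it is why the inverse can average two exact reconstructions.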
Wavelet-based image fusion techniques include the Discrete Wavelet Transform (DWT), Stationary Wavelet Transform (SWT), Multi-wavelet Transform (MWT) and Complex Wavelet Transform (CWT). [29] Sanjukta Bhattacharya proposed partial face recognition using transform-domain image fusion. The method combines averaging and DWT and detects the eyes in images with uniform lighting and the same background, and it reported a face recognition acceptance rate between 86.67% and 87.5%. [27] Tanmay Rajpathak also worked on eye feature detection and reported successful eye detection in 90% of frontal face images, but the method failed to detect eyes that were closed. [28] Debotosh Bhattacharjee worked on thermal and visual image fusion for human face recognition on the IRIS database. This work targeted a semi-uncontrolled environment with moderate variations in pose, disguise, illumination and occlusion. The fusion used the DWT, PCA and a multi-layer perceptron, with two fusion rules: maximum for the high-frequency components and weighted average for the low-frequency components. It achieved improved recognition rates of 98.36% on the IRIS database and 95.77% on a face database [29-31]. R. Raghvendra worked on face recognition using DWT-based transform-domain image fusion. Among these existing studies on face features, DWT gives better performance for face feature detection from visible images. However, the DWT has limitations: it produces blurry edges and provides insufficient detail. The Stationary Wavelet Transform (SWT) has been proposed as a solution to these problems. It is an extension of the DWT that omits the downsampling step and is therefore shift-invariant. [7,20] From the literature reviewed above, we observed that very little research has addressed multi-temporal face image fusion, so we chose this as the focus of our image fusion work [32].

6. Proposed Research Work 

In this paper, we focus on fusing multi-temporal images to improve image quality and to reduce the semantic gap between images captured in different time slots. The proposed work is a multi-temporal image fusion system for the eyes facial feature. It is based on content-based image retrieval and on feature-level image fusion in the spatial and transform domains, and it aims to reduce the semantic gap between two source images and obtain a better-quality image. In the first stage of the experiment, a primary database is created with the time of image capture as the controlled parameter. It contains 70 images of 35 individuals, captured in a multi-temporal way in two sessions one month apart [33]. As discussed above, various factors may affect the accuracy of feature detection, so the following parameters were fixed while capturing the images:

i) Position of the person: frontal position facing the camera.
ii) Lighting effect: natural daylight.
iii) Distance and angle between camera and person: 5 ft, with the camera perpendicular to the person (90 degree angle between camera and person).
iv) Background of the object: natural background in the room.
v) Facial expression: all subjects were asked to keep a neutral expression, without any special expressions or gestures.

Only one factor is varied in this experimental work: the time slot in which the image is captured [34-37]. The images are therefore captured in two time slots one month apart. In this way, 2 images are captured per person (subject), and the analysis is performed on a total of 70 images (35 subjects × 2 sessions).

In the second step, the images are preprocessed. The original images are converted to grayscale, noise is removed using a median filter, and the contrast is adjusted. The images are then resized to 10% of their original size for feature detection and extraction. In this step, the eye features are detected and extracted from the source images of both time slots using the Viola-Jones detection algorithm. We used this algorithm for two reasons. It gives accurate results on still images when detecting the upper body, and its Haar-like features resemble facial structures and are evaluated efficiently using an integral image. These features are categorized as Type 1, Type 2, Type 3 and Type 4 [2,9]. Type 1 and Type 2 Haar features are used to detect the eyes in the images of the 35 individuals. After detection, the images are cropped and normalized to 160 × 180 pixels (width × height) before fusion.

The extracted eye features from the two source images are then fused in two domains, the spatial domain and the transform domain, to analyze image quality. In the spatial domain, the PCA algorithm is used with maximum and average fusion rules. In the transform domain, the wavelet-based SWT algorithm is used with level-1 decomposition. Fig. 1 shows sample images of eye feature fusion using PCA, SWT and the hybrid approach (PCA+SWT). These images were captured in two time slots one month apart, referred to as Session I and Session II. The fused images obtained with PCA and SWT individually are compared with the result of the hybrid approach (PCA+SWT). They are evaluated with eight quantitative image quality measures: Structural Similarity Index Measure (SSIM), Mean Squared Error (MSE), Normalized Absolute Error (NAE), Correlation Coefficient (CC), Structural Content (SC), Average Difference (AD), Standard Deviation (SD) and Mutual Information (MI). Table 1 shows the eye feature fusion results using PCA, SWT and the hybrid PCA+SWT approach with these quality measures. The system is implemented in MATLAB. The experimental results demonstrate that the hybrid approach using both the spatial and transform domains (PCA+SWT) produces better fused images than the other compared algorithms. This holds even though the images are captured in two different time slots under natural background and lighting conditions.
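The preprocessing and detection front end described above can be sketched with NumPy. This is a minimal sketch under stated assumptions:

- The input is an RGB array of shape (H, W, 3).
- Grayscale conversion uses the luminance weights of MATLAB's `rgb2gray`.
- Noise is removed with a 3×3 median filter.
- A plain min-max stretch stands in for contrast adjustment. MATLAB's `imadjust` saturates 1% of pixels at each end by default, which this does not do.
- Resizing uses nearest-neighbour sampling.

The sketch then builds an integral image and evaluates two-rectangle Haar features. Sources differ on which two-rectangle orientation is called "Type 1" or "Type 2", so the mapping in the hypothetical `haar_type1`/`haar_type2` helpers is an assumption. A complete Viola-Jones detector also needs a trained cascade of such features, which is not shown.

```python
import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Luminance conversion with the weights MATLAB's rgb2gray uses."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.2989, 0.5870, 0.1140])


def median3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter (edge-replicated borders) to suppress impulse noise."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    windows = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)


def stretch_contrast(img: np.ndarray) -> np.ndarray:
    """Min-max stretch to [0, 1]; a simplified stand-in for contrast adjustment."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros_like(img, dtype=np.float64)
    return (img - lo) / (hi - lo)


def resize_nearest(img: np.ndarray, scale: float) -> np.ndarray:
    """Nearest-neighbour resize by `scale` (e.g. 0.1 keeps 10% of each dimension)."""
    h, w = img.shape
    out_h, out_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.minimum((np.arange(out_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) / scale).astype(int), w - 1)
    return img[np.ix_(rows, cols)]


def preprocess(rgb: np.ndarray, scale: float = 0.1) -> np.ndarray:
    """Grayscale -> median filter -> contrast stretch -> resize, in the order described."""
    return resize_nearest(stretch_contrast(median3x3(to_gray(rgb))), scale)


def integral_image(img: np.ndarray) -> np.ndarray:
    """ii[r, c] = sum of img[:r, :c]; the extra zero row/column simplifies lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, r: int, c: int, h: int, w: int) -> float:
    """Sum of the h x w rectangle with top-left corner (r, c), in four lookups."""
    return float(ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c])


def haar_type1(ii: np.ndarray, r: int, c: int, h: int, w: int) -> float:
    """Assumed Type 1: left half minus right half (w should be even)."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)


def haar_type2(ii: np.ndarray, r: int, c: int, h: int, w: int) -> float:
    """Assumed Type 2: top half minus bottom half (h should be even).
    A dark eye band above brighter cheeks gives a strongly negative value."""
    half = h // 2
    return rect_sum(ii, r, c, half, w) - rect_sum(ii, r + half, c, half, w)
```

The integral image is the reason Haar features are cheap. Once it is built, any rectangle sum costs four array lookups, whatever the rectangle size. A detector can therefore evaluate many candidate eye windows per image.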

The hybrid approach, which combines spatial-domain PCA with the transform-domain wavelet-based SWT algorithm, gives the best overall fusion quality in this experiment, with a reduced semantic gap. It is effective for fusing multi-temporal images. It provides reliable and accurate feature detection and keeps the semantic gap small despite appearance variations in lighting and background conditions.
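The paper does not spell out how PCA and SWT are combined in the hybrid approach. One plausible arrangement is shown here as a hypothetical sketch, not as the authors' method. Both images are decomposed with a one-level undecimated Haar SWT. The approximation bands are then fused with PCA-derived weights and the detail bands with the maximum-absolute rule, after which the transform is inverted. Periodic boundaries and the averaging Haar normalization are further assumptions of the sketch.

```python
import numpy as np


def _analysis(x, axis):
    """Undecimated Haar step along `axis` (periodic boundary)."""
    nxt = np.roll(x, -1, axis=axis)
    return (x + nxt) / 2.0, (x - nxt) / 2.0


def _synthesis(lo, hi, axis):
    """Inverse Haar step: average of the two exact reconstructions."""
    return ((lo + hi) + np.roll(lo - hi, 1, axis=axis)) / 2.0


def swt2(img):
    """Level-1 2-D SWT: approximation band plus three same-size detail bands."""
    lo, hi = _analysis(img, axis=1)
    ll, d1 = _analysis(lo, axis=0)
    d2, d3 = _analysis(hi, axis=0)
    return ll, (d1, d2, d3)


def iswt2(ll, details):
    """Inverse of swt2: undo the vertical step, then the horizontal step."""
    d1, d2, d3 = details
    return _synthesis(_synthesis(ll, d1, axis=0), _synthesis(d2, d3, axis=0), axis=1)


def pca_weights(x, y):
    """Normalised principal eigenvector of the 2x2 covariance of x and y."""
    cov = np.cov(np.stack([x.ravel(), y.ravel()]))
    v = np.abs(np.linalg.eigh(cov)[1][:, -1])  # eigh sorts eigenvalues ascending
    return v / v.sum()


def fuse_hybrid(img_a, img_b):
    """Hypothetical PCA+SWT hybrid: PCA weights on the approximation bands,
    maximum-absolute rule on the detail bands, then inverse SWT."""
    ll_a, det_a = swt2(img_a.astype(np.float64))
    ll_b, det_b = swt2(img_b.astype(np.float64))
    w = pca_weights(ll_a, ll_b)
    ll_f = w[0] * ll_a + w[1] * ll_b
    det_f = tuple(np.where(np.abs(da) >= np.abs(db), da, db)
                  for da, db in zip(det_a, det_b))
    return iswt2(ll_f, det_f)
```

In this arrangement, PCA sets how much each session contributes to the low-frequency content (overall brightness and shading). The SWT detail rule keeps the sharper edges from whichever session has them.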

7. Results and Discussion

The outcome of this experiment shows that the Viola-Jones algorithm successfully extracts the eye features from these images. Three image fusion algorithms, PCA, SWT and PCA+SWT, are applied to the extracted eye features to improve image quality. The goal is a composite image that is more informative and accurate than either of the original input images, with a reduced semantic gap and minimal data loss or distortion. Figure 1 shows the resulting images of eye feature fusion using spatial-domain PCA, transform-domain SWT and the hybrid PCA+SWT approach. The quality of a fused image can be evaluated subjectively and/or objectively. Subjective methods rely on a human viewer's perceptual judgement of the image's characteristics. Objective methods rely on computational models that can predict perceived image quality. These measures can be qualitative or quantitative. Qualitative measures are time-consuming and require careful control of viewing conditions and of subject equivalence to give meaningful results, whereas quantitative evaluation methods allow effective assessment. Table 1 shows the values of the quantitative image quality measures after image fusion using PCA, SWT and PCA+SWT. Table 2 summarizes the image quality improvement after fusion, and Figure 2 shows this improvement graphically for PCA, SWT and PCA+SWT.
Figure 1 Sample Images of Image Fusion of Eyes Feature Using PCA, SWT and Hybrid Approach (PCA+SWT)
Table 1 Eyes Feature Fusion Using PCA, SWT and Hybrid Approach of PCA+SWT with Quality Measure

Measure | PCA | SWT | PCA+SWT
MSE2 | 0.5393 | 0.1665 | 0.1668
SSIM1 | 0.4484 | 0.4484 | 0.4484
SSIM2 | 0.7394 | 0.7763 | 0.7827
SSIM3 | 0.6002 | 0.7199 | 0.7251
NAE1 | 0.7398 | 0.6089 | 0.6089

Table 2 Summarized Result of Image Quality Improvement After Image Fusion Using PCA, SWT and Hybrid Approach of PCA+SWT

Quality Measure | PCA | SWT | PCA+SWT
SSIM | 0.6002 | 0.7199 | 0.7251
MSE | 0.6106 | 0.1605 | 0.1604
NAE | 0.6089 | 0.4650 | 0.4653
CC | 0.4980 | 0.7317 | 0.7311
SC | 0.7288 | 0.7288 | 0.9260
AD | 0.3168 | 0.1585 | 0.1583
MI | 0.9580 | 0.9583 | 0.9583

Figure 2 Image Fusion Result with Quality Measures (PCA, SWT and PCA+SWT)

The experimental work is performed on primary photos of 35 individuals, taken with a smartphone camera in two sessions one month apart. The aim is to study the results of fusing these multi-temporal images with the PCA, SWT and PCA+SWT algorithms. The following was observed from the analysis:
1) Mean Squared Error (MSE), Normalized Absolute Error (NAE) and Average Difference (AD) indicate how far the fused image is from the original image. MSE measures the error between the two images, and AD gives the overall average difference between corresponding pixels of the two images. Lower values of these three parameters indicate better fusion performance, and the values obtained here lie in the range 0 to 1. In this experiment, the hybrid PCA+SWT approach gives the lowest MSE and AD, and its NAE is essentially equal to that of SWT. All three values are much lower than for PCA alone. This indicates better fused-image quality than PCA and comparable or slightly better quality than SWT.

2) The fused image is also evaluated with five quantitative measures: Structural Similarity Index Measure (SSIM), Correlation Coefficient (CC), Structural Content (SC), Standard Deviation (SD) and Mutual Information (MI). SSIM, CC and SC reflect the structural similarity between the source image and the fused image. SD reflects the contrast of the fused image, and MI estimates the amount of information transferred from both source images to the fused image. Higher values of these parameters indicate better fusion performance; for SC, a value closer to 1 indicates better agreement. In this experiment, the hybrid PCA+SWT approach gives the highest SSIM and the SC closest to 1. Its CC and MI are practically equal to those of SWT, indicating improved quality of the fused image.
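The eight measures can be computed with NumPy as shown below. The paper does not give its formulas, so these are assumed textbook definitions. Here `ref` is a reference (source) image and `fused` is the fused image, both float arrays scaled to [0, 1]. SSIM is computed as a single window over the whole image. This simplifies the standard SSIM, which averages the index over local, typically Gaussian-weighted, windows. MI is estimated from a joint histogram with an arbitrary bin count, and the normalized MI variants used in some papers will give different values. In fusion studies, each measure is often computed against each source image in turn.

```python
import numpy as np


def mse(ref: np.ndarray, fused: np.ndarray) -> float:
    """Mean squared error; lower is better."""
    return float(np.mean((ref - fused) ** 2))


def nae(ref: np.ndarray, fused: np.ndarray) -> float:
    """Normalised absolute error: sum|R - F| / sum|R|; lower is better."""
    return float(np.sum(np.abs(ref - fused)) / np.sum(np.abs(ref)))


def average_difference(ref: np.ndarray, fused: np.ndarray) -> float:
    """Average difference: mean of (R - F); values near 0 are better.
    (Some authors use the mean absolute difference instead.)"""
    return float(np.mean(ref - fused))


def structural_content(ref: np.ndarray, fused: np.ndarray) -> float:
    """Structural content: sum(R^2) / sum(F^2); values close to 1 are better."""
    return float(np.sum(ref ** 2) / np.sum(fused ** 2))


def correlation_coefficient(ref: np.ndarray, fused: np.ndarray) -> float:
    """Pearson correlation between the two images; higher is better."""
    return float(np.corrcoef(ref.ravel(), fused.ravel())[0, 1])


def standard_deviation(fused: np.ndarray) -> float:
    """Spread of grey levels in the fused image, a proxy for contrast."""
    return float(np.std(fused))


def mutual_information(ref: np.ndarray, fused: np.ndarray, bins: int = 64) -> float:
    """Mutual information in bits, estimated from a joint grey-level histogram."""
    joint, _, _ = np.histogram2d(ref.ravel(), fused.ravel(), bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of ref, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of fused, shape (1, bins)
    nz = p_xy > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])))


def global_ssim(ref: np.ndarray, fused: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole image (simplified; the standard index
    averages this quantity over local windows)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), fused.mean()
    var_x, var_y = ref.var(), fused.var()
    cov_xy = np.mean((ref - mu_x) * (fused - mu_y))
    return float(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
                 / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```

When a fused image is identical to its reference, MSE, NAE and AD are 0, and SC, CC and SSIM are 1. These values give a quick sanity check before the functions are applied to real eye crops.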

9. Future Scope 
The experiments can be extended to other image formats, such as .gif and .bmp. This work can also be extended to face detection using other facial features captured from a wider range of distances. It can further be applied to images with different angles, human poses and facial expressions, to explore further applications and improve the accuracy of the system.
Conclusion 
This research examines different aspects of image fusion for images captured in a multi-temporal way. These aspects are improving spatial resolution and image accuracy, enhancing the display of features, reducing the semantic gap and supporting visual interpretation.

The analysis of the image fusion results shows that the hybrid PCA+SWT image fusion technique reduces the error in fused images and increases structural and content similarity in multi-temporal images.